A Multi-Aspect Comparison and Evaluation on Thai Word Segmentation Programs

Authors

  • Chaluemwut Noyunsan
  • Choochart Haruechaiyasak
  • Seksan Poltree
  • Kanda Runapongsa Saikaew
Abstract

Word segmentation is an important task in natural language processing, especially for languages without explicit word boundaries, such as Thai. Many Thai word segmentation programs have been developed, and researchers and developers working with Thai documents often spend a great deal of time studying and trying them. This paper evaluates six Thai word segmentation programs: Libthai, Swath, Wordcut, CRF++, Thaisemantics, and Tlexs. Based on experimental results, we compare these programs in terms of usage, response time, timeouts, and relevance.

Similar resources

Thai Word Segmentation Verification Tool

Since Thai has no explicit word boundary, word segmentation is the first thing to do before developing any Thai NLP applications. In order to create large Thai word-segmented corpora to train a word segmentation model, an efficient verification tool is needed to help linguists work more conveniently to check the accuracy and consistency of the corpora. This paper proposes Thai Word Segmentation...

A Lexicalized Tree Adjoining Grammar for Thai

This paper describes an alternative formalism for Thai syntax parsing based on a lexicalized tree adjoining grammar (LTAG). We first briefly present some formal background concerning LTAG, which is necessary for an understanding of LTAG and its application to Thai. Specifically, we address several issues regarding difficulties in parsing Thai sentences and how to resolve these issues using LTAG...

Word Segmentation in Indo-China Languages for Digital Libraries

This chapter introduces word segmentation methods for Indo-China languages. It describes six different word segmentation methods developed for the Thai, Vietnamese, and Myanmar languages and compares different approaches in terms of their algorithms and results achieved. The discussion and comparison of these word segmentation methods will provide underlying views about how word segmentation can...

Dictionary-based Thai CLIR: Experimental Survey of Thai CLIR

This paper describes our participation in Cross-Language Information Retrieval (CLIR) at the Cross-Language Evaluation Forum. Our objectives for this experiment were threefold. Firstly, the coverage of the Thai-bilingual dictionary was evaluated when translating queries. Secondly, we examined whether the segmentation process affected CLIR. Lastly, this research investigates the que...

Thoughts on Word and Sentence Segmentation in Thai

This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation are caused mostly by compound words. To set a standard resource and tool for word segmentation, we suggest that only simple words and true compound words should be segmented in the process of word segmentation. Other compounds can be grouped later by the same means as multiword identific...

Publication date: 2014